Path: blob/master/Part 7 - Natural Language Processing/[Python] Natural Language Processing.ipynb
Kernel: Python 3
Natural Language Processing
Data Preprocessing
In [1]:
In [2]:
In [3]:
Out[3]:
In [4]:
Out[4]:
1000
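The `1000` above is the number of reviews in the dataset. The code cell is not shown; a minimal sketch of the loading step, assuming the `Restaurant_Reviews.tsv` file and the `pd.read_csv` arguments (tab delimiter, `quoting=3` to ignore double quotes inside review text) — here with a two-row stand-in so it runs without the file:

```python
import pandas as pd

# Presumed loading step (file name and arguments are assumptions):
# dataset = pd.read_csv('Restaurant_Reviews.tsv', delimiter='\t', quoting=3)

# Two-row stand-in with the same column layout, for illustration:
dataset = pd.DataFrame({
    'Review': ['Wow... Loved this place.', 'Crust is not good.'],
    'Liked': [1, 0],
})
print(len(dataset))  # the notebook prints 1000 for the full dataset
```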
Cleaning the texts
In [5]:
In [6]:
Out[6]:
In [7]:
Out[7]:
['wow love place',
'crust good',
'tasti textur nasti',
'stop late may bank holiday rick steve recommend love',
'select menu great price',
'get angri want damn pho',
'honeslti tast fresh',
'potato like rubber could tell made ahead time kept warmer',
'fri great',
'great touch']
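The stemmed, stopword-free tokens above suggest a cleaning pipeline of: keep letters only, lowercase, drop stopwords, and stem with NLTK's `PorterStemmer`. A sketch under those assumptions — the notebook presumably uses `nltk.corpus.stopwords`; the small hand-picked list here is an illustrative stand-in (note it removes `not`, which the output above also shows, and which can hurt a sentiment model):

```python
import re
from nltk.stem.porter import PorterStemmer

# Illustrative stand-in for nltk.corpus.stopwords.words('english'):
STOPWORDS = {'is', 'this', 'the', 'a', 'an', 'not', 'was', 'it', 'so'}
stemmer = PorterStemmer()

def clean_review(text):
    text = re.sub('[^a-zA-Z]', ' ', text).lower()             # letters only, lowercase
    words = [w for w in text.split() if w not in STOPWORDS]   # drop stopwords
    return ' '.join(stemmer.stem(w) for w in words)           # stem each word

print(clean_review('Wow... Loved this place.'))  # -> wow love place
print(clean_review('Crust is not good.'))        # -> crust good
```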
Creating the Bag of Words model
In [8]:
In [9]:
Out[9]:
(1000, 1500)
In [10]:
In [11]:
Out[11]:
array([1, 0, 0, 1, 1, 0, 0, 0, 1, 1])
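The `(1000, 1500)` shape points to scikit-learn's `CountVectorizer` with `max_features=1500`, which keeps the 1500 most frequent tokens across the corpus. A sketch on a three-review stand-in corpus (the vectorizer call is the assumption; `y` would be the `Liked` column, i.e. the 0/1 array shown above):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['wow love place', 'crust good', 'tasti textur nasti']  # cleaned reviews

# max_features=1500 matches the (1000, 1500) shape in the notebook.
cv = CountVectorizer(max_features=1500)
X = cv.fit_transform(corpus).toarray()
print(X.shape)  # (3, 8) for this stand-in; (1000, 1500) in the notebook
# y = dataset.iloc[:, 1].values  # the 'Liked' labels
```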
Splitting the dataset into the Training set and Test set
In [12]:
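The confusion matrices below each sum to 250 of the 1000 reviews, which suggests `test_size=0.25`; `random_state=0` is an assumption. A sketch with zero-filled stand-ins for the Bag of Words matrix and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins for the 1000 x 1500 Bag of Words matrix and the labels:
X = np.zeros((1000, 1500), dtype=int)
y = np.zeros(1000, dtype=int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
print(X_test.shape)  # (250, 1500)
```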
Fitting Naive Bayes to the Training set
In [13]:
Out[13]:
GaussianNB(priors=None)
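The `GaussianNB(priors=None)` repr above is what scikit-learn echoes after fitting a default `GaussianNB`. A sketch on tiny stand-in data (in the notebook, `X_train`/`y_train` come from the split of the Bag of Words matrix):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Tiny stand-in training data for illustration:
X_train = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_train = np.array([1, 0, 1, 0])

classifier = GaussianNB()  # default params, matching the repr above
classifier.fit(X_train, y_train)
print(classifier.predict([[1, 0]]))  # -> [1]
```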
Predicting the Test set results
In [14]:
Making the Confusion Matrix
In [15]:
Out[15]:
array([[ 66, 62],
[ 18, 104]])
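The prediction and confusion-matrix cells presumably call `classifier.predict(X_test)` and `sklearn.metrics.confusion_matrix`. A sketch with stand-in labels, showing the layout scikit-learn uses (rows = true class, columns = predicted class) — so the Naive Bayes matrix above reads as 66 true negatives, 62 false positives, 18 false negatives, 104 true positives:

```python
from sklearn.metrics import confusion_matrix

# In the notebook: y_pred = classifier.predict(X_test).
# Stand-in labels for illustration:
y_test = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
cm = confusion_matrix(y_test, y_pred)
print(cm)  # rows = true class, columns = predicted class
```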
Homework
1. Run the other classification models we built in Part 3 - Classification, apart from the one used in the last tutorial.
Decision Tree
In [16]:
Out[16]:
array([[94, 34],
[50, 72]])
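A sketch of the decision tree cell on tiny stand-in data. `criterion='entropy'` and `random_state=0` follow the Part 3 classification templates in this course, but are assumptions here:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Tiny stand-in training data for illustration:
X_train = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_train = np.array([1, 0, 1, 0])

classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)
print(classifier.predict([[1, 0]]))  # -> [1]
```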
Random Forest Classification
In [17]:
Out[17]:
array([[113, 15],
[ 56, 66]])
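A sketch of the random forest cell on the same tiny stand-in data. `n_estimators=10`, `criterion='entropy'`, and `random_state=0` again follow the Part 3 template and are assumptions for this notebook:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny stand-in training data for illustration:
X_train = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_train = np.array([1, 0, 1, 0])

classifier = RandomForestClassifier(n_estimators=10, criterion='entropy',
                                    random_state=0)
classifier.fit(X_train, y_train)
print(classifier.predict([[1, 0]]))
```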
2. Evaluate the performance of each of these models and try to beat the accuracy obtained in the tutorial. Remember that accuracy alone is not enough: also look at other performance metrics such as Precision (measuring exactness), Recall (measuring completeness), and the F1 Score (a compromise between Precision and Recall). Their formulas are given below (TP = # True Positives, TN = # True Negatives, FP = # False Positives, FN = # False Negatives):
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 Score = 2 * Precision * Recall / (Precision + Recall)
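As a worked example of these formulas, the four metrics computed from the Naive Bayes confusion matrix above, reading TN=66, FP=62, FN=18, TP=104 from `[[66, 62], [18, 104]]` (scikit-learn's rows = true class, columns = predicted class):

```python
TN, FP, FN, TP = 66, 62, 18, 104  # from the Naive Bayes confusion matrix

accuracy = (TP + TN) / (TP + TN + FP + FN)          # 170 / 250 = 0.68
precision = TP / (TP + FP)                          # 104 / 166 ~ 0.6265
recall = TP / (TP + FN)                             # 104 / 122 ~ 0.8525
f1 = 2 * precision * recall / (precision + recall)  # ~ 0.7222
print(accuracy, precision, recall, f1)
```

These reproduce the Naive Bayes numbers printed in the next cell.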
Accuracy, Precision, Recall, F1 Score of Naive Bayes
In [18]:
Out[18]:
Accuracy of Naive Bayes: 0.68
Precision of Naive Bayes: 0.626506024096
Recall of Naive Bayes: 0.852459016393
F1 Score of Naive Bayes: 0.722222222222
Accuracy, Precision, Recall, F1 Score of Decision Tree
In [19]:
Out[19]:
Accuracy of Decision Tree: 0.664
Precision of Decision Tree: 0.679245283019
Recall of Decision Tree: 0.590163934426
F1 Score of Decision Tree: 0.631578947368
Accuracy, Precision, Recall, F1 Score of Random Forest
In [20]:
Out[20]:
Accuracy of Random Forest: 0.716
Precision of Random Forest: 0.814814814815
Recall of Random Forest: 0.540983606557
F1 Score of Random Forest: 0.650246305419
In [ ]: